Rationalize and try to fix failing ldiv tests #2809

Open: kshyatt wants to merge 2 commits into base: master

kshyatt (Member) commented Jul 2, 2025

Trying to fix intermittently failing CI. It doesn't make sense to have these checks for only one of the in-place/out-of-place versions. Hopefully this helps stability.
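
As an illustration of the idea in this comment, here is a minimal CPU-only sketch that applies the same checks to both the out-of-place (`\`) and in-place (`ldiv!`) triangular solves. It uses only the `LinearAlgebra` and `Test` standard libraries; the helper name `check_ldiv` and the matrices are hypothetical and are not taken from the PR or from the CUSPARSE test suite.

```julia
using LinearAlgebra, Test

# Hypothetical helper (not from the PR): run identical checks on the
# out-of-place and in-place triangular solves.
function check_ldiv(T, B)
    C = T \ B                # out-of-place solve, allocates the result
    D = ldiv!(T, copy(B))    # in-place solve, overwrites the copy of B
    @test T * C ≈ B          # out-of-place result solves the system
    @test T * D ≈ B          # in-place result solves the system
    @test C ≈ D              # both variants agree
    return C, D
end

# Small, well-conditioned upper-triangular system with two right-hand sides.
A = [4.0 1.0 2.0; 0.0 3.0 1.0; 0.0 0.0 2.0]
B = [1.0 2.0; 3.0 4.0; 5.0 6.0]
C, D = check_ldiv(UpperTriangular(A), B)
```

Routing both variants through one helper keeps the assertions from drifting apart, which is the asymmetry the comment describes.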

kshyatt requested a review from maleadt July 2, 2025 17:17
kshyatt added the cuda libraries (Stuff about CUDA library wrappers.) and tests (Adds or changes tests.) labels Jul 2, 2025
github-actions bot (Contributor) commented Jul 2, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/test/libraries/cusparse/interfaces.jl b/test/libraries/cusparse/interfaces.jl
index fa25d8330..34f9d75f8 100644
--- a/test/libraries/cusparse/interfaces.jl
+++ b/test/libraries/cusparse/interfaces.jl
@@ -258,7 +258,7 @@ nB = 2
                                 end
                             end
                             @testset "\\ -- CuMatrix" begin
-                                C  = triangle(opa(A)) \ opb(B)
+                                C = triangle(opa(A)) \ opb(B)
                                 dC = triangle(opa(dA)) \ opb(dB)
                                 @test C ≈ collect(dC)
                                 if CUSPARSE.version() < v"12.0"

github-actions bot (Contributor) left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: ef0395c | Previous: 4f38802 | Ratio |
|---|---|---|---|
| latency/precompile | 42834310121 ns | 42824416725 ns | 1.00 |
| latency/ttfp | 7055511558 ns | 7051266950 ns | 1.00 |
| latency/import | 3581529444 ns | 3574411987 ns | 1.00 |
| integration/volumerhs | 9609883 ns | 9608389 ns | 1.00 |
| integration/byval/slices=1 | 146843 ns | 146872 ns | 1.00 |
| integration/byval/slices=3 | 425595 ns | 425794 ns | 1.00 |
| integration/byval/reference | 145040 ns | 144942 ns | 1.00 |
| integration/byval/slices=2 | 286222 ns | 286144 ns | 1.00 |
| integration/cudadevrt | 103486 ns | 103388 ns | 1.00 |
| kernel/indexing | 14275 ns | 14276 ns | 1.00 |
| kernel/indexing_checked | 14903 ns | 15083 ns | 0.99 |
| kernel/occupancy | 665.1635220125786 ns | 677.6114649681529 ns | 0.98 |
| kernel/launch | 2084.9 ns | 2157.8888888888887 ns | 0.97 |
| kernel/rand | 15789 ns | 14900 ns | 1.06 |
| array/reverse/1d | 19840 ns | 20028 ns | 0.99 |
| array/reverse/2d | 24759 ns | 25007 ns | 0.99 |
| array/reverse/1d_inplace | 10572 ns | 10952 ns | 0.97 |
| array/reverse/2d_inplace | 12164 ns | 12545 ns | 0.97 |
| array/copy | 21058 ns | 21084 ns | 1.00 |
| array/iteration/findall/int | 156440 ns | 158043.5 ns | 0.99 |
| array/iteration/findall/bool | 139111 ns | 140007 ns | 0.99 |
| array/iteration/findfirst/int | 161781 ns | 164557.5 ns | 0.98 |
| array/iteration/findfirst/bool | 163534.5 ns | 167385 ns | 0.98 |
| array/iteration/scalar | 71885 ns | 74295 ns | 0.97 |
| array/iteration/logical | 211491.5 ns | 215875.5 ns | 0.98 |
| array/iteration/findmin/1d | 46205 ns | 47331 ns | 0.98 |
| array/iteration/findmin/2d | 96280 ns | 97017 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 41975.5 ns | 43072.5 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 45421 ns | 55698.5 ns | 0.82 |
| array/reductions/reduce/Int64/dims=2 | 61650 ns | 62572.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 88841 ns | 89129 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87080 ns | 88184.5 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 34328 ns | 35313 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 44565.5 ns | 51818 ns | 0.86 |
| array/reductions/reduce/Float32/dims=2 | 59597 ns | 59835 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 52401 ns | 52336 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 70127 ns | 70233.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 42310 ns | 44093 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=1 | 46824 ns | 47633.5 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2 | 61836 ns | 62709 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 89032 ns | 89036 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87169 ns | 87347.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 34370 ns | 34780.5 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=1 | 41835 ns | 41996.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2 | 60443 ns | 60450.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52707 ns | 52739 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 70578 ns | 70715 ns | 1.00 |
| array/broadcast | 20213 ns | 20360 ns | 0.99 |
| array/copyto!/gpu_to_gpu | 12757 ns | 12890 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 215578 ns | 217680 ns | 0.99 |
| array/copyto!/gpu_to_cpu | 284650.5 ns | 286671 ns | 0.99 |
| array/accumulate/Int64/1d | 124951 ns | 125190 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83369 ns | 84136 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 157877 ns | 158690 ns | 0.99 |
| array/accumulate/Int64/dims=1L | 1710070 ns | 1709534 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966339 ns | 967437 ns | 1.00 |
| array/accumulate/Float32/1d | 109036.5 ns | 109803 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80422.5 ns | 81170 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147586 ns | 147834 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618609 ns | 1619112.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698220 ns | 698583 ns | 1.00 |
| array/construct | 1284.9 ns | 1275.8 ns | 1.01 |
| array/random/randn/Float32 | 43542.5 ns | 44761 ns | 0.97 |
| array/random/randn!/Float32 | 24899 ns | 25104 ns | 0.99 |
| array/random/rand!/Int64 | 27523 ns | 27468 ns | 1.00 |
| array/random/rand!/Float32 | 8803.666666666666 ns | 8662 ns | 1.02 |
| array/random/rand/Int64 | 38172 ns | 30080 ns | 1.27 |
| array/random/rand/Float32 | 13130 ns | 13152 ns | 1.00 |
| array/permutedims/4d | 60326.5 ns | 60473 ns | 1.00 |
| array/permutedims/2d | 53930 ns | 54524 ns | 0.99 |
| array/permutedims/3d | 54833 ns | 55468 ns | 0.99 |
| array/sorting/1d | 2757985 ns | 2763710 ns | 1.00 |
| array/sorting/by | 3344404 ns | 3356377 ns | 1.00 |
| array/sorting/2d | 1080451 ns | 1085339 ns | 1.00 |
| cuda/synchronization/stream/auto | 1026.7857142857142 ns | 1018.0909090909091 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 8057.6 ns | 7602.700000000001 ns | 1.06 |
| cuda/synchronization/stream/blocking | 795.1456310679612 ns | 806.236559139785 ns | 0.99 |
| cuda/synchronization/context/auto | 1172.1 ns | 1183.8 ns | 0.99 |
| cuda/synchronization/context/nonblocking | 8369.599999999999 ns | 7801 ns | 1.07 |
| cuda/synchronization/context/blocking | 914.7560975609756 ns | 897.2923076923076 ns | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.
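
The Ratio column appears to be the current time divided by the previous time, rounded to two decimals, so values above 1.00 mean the current commit is slower. This is inferred from the rows themselves rather than stated in the comment. A minimal sketch under that assumption, with a hypothetical helper named `ratio`:

```julia
# Assumed definition (inferred from the table, not stated in the comment):
# ratio = current / previous, rounded to two decimals.
ratio(current_ns, previous_ns) = round(current_ns / previous_ns; digits = 2)

# Two rows from the table that moved the most.
r_reduce = ratio(45421, 55698.5)   # array/reductions/reduce/Int64/dims=1
r_rand   = ratio(38172, 30080)     # array/random/rand/Int64
```

With these inputs the helper reproduces the 0.82 and 1.27 shown for those two rows.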

kshyatt force-pushed the ksh/interfaces_fix branch from 5db6744 to ef0395c July 23, 2025 19:02
Labels: cuda libraries (Stuff about CUDA library wrappers.), tests (Adds or changes tests.)
2 participants